Abstract

In this paper we give a thorough presentation of a model proposed by Tononi et al. for modeling integrated information, i.e. how much information is generated in a system transitioning from one state to the next by the causal interaction of its parts, above and beyond the information given by the sum of its parts. We also provide a more general formulation of the model, independent of the time instant chosen for the analysis and of the uniformity of the probability distribution at the initial time instant. Finally, we prove that integrated information is zero for disconnected systems.

Keywords: integrated information, effective information, information theory, neural networks, probabilistic boolean networks.

1 Introduction 

The term integrated information (denoted $\phi$ for short) was introduced by Giulio Tononi [15][6][10] to characterize the capacity of a system to integrate the information acquired by its parts. Informally speaking, the integrated information of a system in a given state can be described as the information (in the information-theoretic sense) generated by the system in the transition from one given state to the next, as a consequence of the causal interaction of its parts, above and beyond the sum of the information generated independently by each of its parts.

The theory was first introduced as a linear model [16][9]-[12], then reformulated as a discrete one in [2][7], with the aim of formally capturing what consciousness is in living beings [1][11]. Its description is not always clear from a mathematical point of view and, to the best of our knowledge, this is the first formal description in which all steps of the model are presented in detail using the framework of probabilistic boolean networks.

In our presentation we also provide a more general formulation of the model, which can be used for analyzing the system at a generic time instant and which does not require the assumption that the probability distribution at the initial time instant is uniform.

We also formally prove here, for the first time in the literature to the best of our knowledge, that integrated information is zero for a disconnected system, i.e. a system made up of independent components.

The characterization of integrated information is based on another concept, also defined by Tononi and coauthors, called effective information, which models how much information an external observer gains about the previous state of a system by checking its current state, with respect to what can be deduced "a priori" about the previous state from the known dynamics of the system itself. Given this emphasis on the experimental side of the knowledge acquisition process, we suggest using the terms "experimental information" or "Galileian information" as synonyms for "effective information".

Effective information is zero for static systems and for uniformly random systems, which is consistent with the everyday experience of scientists: observing the current state of such systems teaches nothing about their previous state beyond what was already known. Similarly, integrated information is zero for disconnected systems, regardless of their kind.

2 Probabilistic Boolean Networks 

Let $X = (V, E)$ be a directed graph with $n$ boolean nodes, i.e. nodes taking values in $\{0, 1\}$. The value taken by a node is also called its state. An edge $(u, v) \in E$ models the fact that node $v$ receives as input the state of $u$. We assume that time runs in discrete steps, or instants, and that nodes may change their value over time depending on the states of their input nodes.

The temporal evolution of the state of node $i$ is given by a law $f_i : \{0, 1\}^{n_i} \to \{0, 1\}$ computing the state of $i$ at the next time instant as a function of the current states of its $n_i \le n$ input nodes only. Self loops are admitted. Nodes can all share the same law $f$, or each node can have its own specific law. In any case, laws are constant in time.

We call $X$ as defined above a Deterministic Boolean Network. To put things into context, Random Boolean Networks have been defined in the literature for many years; they differ from the deterministic version only in the fact that each $f_i$ is chosen at random when the network is built. Random boolean networks have been widely studied as models of gene expression in biological systems.

Various probabilistic versions of Boolean Networks have also been defined, different from ours, for 
example H , where each node at each time instant randomly chooses, according to a given probability 
distribution, the law to be used from a finite domain of admissible laws. 

Our version of Probabilistic Boolean Network (PBN, for short) assumes that the probabilistic law $r_i : \{0, 1\}^{n_i} \to [0, 1]$ associated with node $i$ provides, for each configuration of the states of its $n_i$ input nodes, the probability $r_i$ that at the next time instant node $i$ has (equivalently, is in) state 1, $1 - r_i$ being then the probability that $i$ is in state 0. It can be shown that this model can describe every network defined according to the model introduced in that work. In the following we use the terms system and network interchangeably.

At each time instant $t$ a PBN can be in any of its $2^n$ states, which we assume to be given some arbitrary enumeration $\{x_i\}$. The state of network $X$ at time $t$ is denoted $X_t$. A PBN can also be regarded as a Markov chain with a finite state space.

A PBN is completely described by its state transition matrix $S$, whose elements $s_{ij}$ are:

$$s_{ij} = p(X_{t+1} = x_j \mid X_t = x_i)$$

that is, element $s_{ij}$ is the probability that at time $t + 1$ the network is in state $x_j$, conditioned on the fact that at time $t$ the network was in state $x_i$. Note that since the probabilistic law associated with each node is constant in time, the state transition matrix $S$ is also constant in time, hence we can speak of a homogeneous Markov chain. A square matrix of real numbers is a state transition matrix if $0 \le s_{ij} \le 1$ and $\sum_{j=1}^{2^n} s_{ij} = 1$ for every $i$.

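The values $s_{ij}$ can be easily computed from the laws $r_k$ of the single nodes as follows. Let $i = \sigma_n \sigma_{n-1} \ldots \sigma_1$ be the bit string representing the network state at instant $t$, where $\sigma_k$ represents the state of node $k$ at instant $t$. The network state at the next instant $t + 1$ is $j = \sigma'_n \sigma'_{n-1} \ldots \sigma'_1$, where $\sigma'_k$ is the state of node $k$ computed by law $r_k$ for instant $t + 1$: $\sigma'_k = 1$ with probability $r_k(\sigma_n \sigma_{n-1} \ldots \sigma_1)$ and $\sigma'_k = 0$ with probability $1 - r_k(\sigma_n \sigma_{n-1} \ldots \sigma_1)$. Since, given the current state, each node updates independently of the others, we have

$$s_{ij} = \prod_{k=1}^{n} p_k$$

where $p_k = r_k(\sigma_n \sigma_{n-1} \ldots \sigma_1)$ if $\sigma'_k = 1$ and $p_k = 1 - r_k(\sigma_n \sigma_{n-1} \ldots \sigma_1)$ if $\sigma'_k = 0$.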
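The product formula can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes each law $r_k$ is given as a Python function of the full current state (a tuple of bits, with node $k$ at position $k$) and uses the `itertools.product` order as the arbitrary state enumeration $\{x_i\}$. The names `transition_matrix` and `example_laws` are hypothetical.

```python
from itertools import product


def transition_matrix(laws):
    """Build the 2^n x 2^n state transition matrix S of a PBN.

    laws[k](x) returns r_k(x), the probability that node k is in state 1 at
    the next instant given the current state x (a tuple of n bits).
    Given x, nodes update independently, so s_ij is the product of the p_k.
    """
    n = len(laws)
    states = list(product((0, 1), repeat=n))  # arbitrary enumeration {x_i}
    S = []
    for x in states:
        r = [law(x) for law in laws]  # r_k(x) for every node k
        row = []
        for y in states:
            s = 1.0
            for k in range(n):
                # p_k = r_k(x) if node k is 1 in the next state y, else 1 - r_k(x)
                s *= r[k] if y[k] == 1 else 1.0 - r[k]
            row.append(s)
        S.append(row)
    return states, S


# Hypothetical 2-node example: node 0 copies node 1 with 10% noise,
# node 1 deterministically negates node 0.
example_laws = [
    lambda x: 0.9 if x[1] else 0.1,
    lambda x: 1.0 - x[0],
]
states, S = transition_matrix(example_laws)
# Every row of S is a probability distribution (0 <= s_ij <= 1, rows sum to 1).
assert all(abs(sum(row) - 1.0) < 1e-12 for row in S)
```

The full state is passed to every law for simplicity; a law that reads only its $n_i$ input nodes simply ignores the other positions.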

Let us denote with $p_t(i) = p(X_t = x_i)$ the probability that the network is in state $x_i$ at instant $t$. The state probability distribution at $t + 1$ is given by:

$$p_{t+1}(i) = \sum_{j=1}^{2^n} p_t(j)\, s_{ji}$$

Note that, even if $S$ is time constant (i.e., stationary), the state probability distribution is not necessarily so. Let $p_t$ be the row vector with elements $p_t(i)$. The previous formula can be written in matrix form as

$$p_{t+1} = p_t \cdot S$$

and, denoting with $S^i$ the $i$-th column of $S$,

$$p_{t+1}(i) = p_t \cdot S^i$$

If for some $t$ it is $p_{t+1} = p_t$, then we say the network is in the stationary regime. It is then

$$p = p S$$

that is, $p$ is a left eigenvector of $S$ with eigenvalue 1. Note that not every left eigenvector of $S$ with eigenvalue 1 can be a stationary probability distribution, since it has to fulfill the probability distribution constraints (non-negative entries summing to 1). For example, the null vector, which trivially satisfies $p = pS$, is never a stationary probability distribution.
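The forward propagation $p_{t+1} = p_t S$ and the stationarity condition $p = pS$ can be illustrated with the following plain-Python sketch. Finding $p$ by repeatedly applying $S$ (power iteration) is our choice for illustration, not part of the model: it converges only when $p_t$ itself converges, which may fail for periodic chains (e.g. a deterministic cycle started from a non-uniform distribution), in which case the function raises an error. The names `step` and `stationary` are hypothetical.

```python
def step(p, S):
    """One step of p_{t+1}(i) = sum_j p_t(j) s_ji, i.e. p_{t+1} = p_t S."""
    m = len(p)
    return [sum(p[j] * S[j][i] for j in range(m)) for i in range(m)]


def stationary(S, p0=None, tol=1e-12, max_iter=100_000):
    """Approximate a stationary p = pS by iterating p <- pS from p0.

    Defaults to a uniform p0. Raises if p_t does not settle, which happens
    e.g. for periodic chains whose p_t keeps oscillating.
    """
    m = len(S)
    p = list(p0) if p0 is not None else [1.0 / m] * m
    for _ in range(max_iter):
        q = step(p, S)
        if max(abs(a - b) for a, b in zip(p, q)) < tol:
            return q
        p = q
    raise RuntimeError("p_t did not converge (the chain may be periodic)")


# Two-state example: the stationary distribution solves p = pS,
# here p = (5/6, 1/6).
S = [[0.9, 0.1],
     [0.5, 0.5]]
p_inf = stationary(S)
```

For this matrix, $p(0) = 0.9\,p(0) + 0.5\,p(1)$ gives $p(0) = 5\,p(1)$, hence $p = (5/6, 1/6)$.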

Row $S_i$ of the state transition matrix provides the conditional probability distribution $p(X_{t+1} \mid X_t = x_i)$, describing the network state at the instant following one in which the network is in state $x_i$.

Network dynamics can also be analyzed backwards in time. Let us assume that we have observed or measured that the network at instant $t$ is in a given state. We can then compute the state probability distribution for instant $t - 1$, that is, the law by which the states at instant $t - 1$ might have caused the state actually observed or measured at instant $t$. This is provided by a state backward-transition matrix $B$, describing the probabilities obtained by inverting, through Bayes' rule, the relations between events. Its elements $b_{ij}(t)$ are:

$$b_{ij}(t) = p(X_{t-1} = x_j \mid X_t = x_i)$$

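which, by the definition of conditional probability, can be written as

$$b_{ij}(t) = \frac{p(X_{t-1} = x_j,\, X_t = x_i)}{p(X_t = x_i)}$$

and applying Bayes' rule we have

$$b_{ij}(t) = \frac{p(X_t = x_i \mid X_{t-1} = x_j)\, p(X_{t-1} = x_j)}{p(X_t = x_i)} = \frac{s_{ji}\, p(X_{t-1} = x_j)}{p(X_t = x_i)} = \frac{p_{t-1}(j)\, s_{ji}}{p_t(i)} = \frac{p_{t-1}(j)\, s_{ji}}{p_{t-1} \cdot S^i}$$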


If at instant $t - 1$ the state probability distribution is uniform, i.e. $p_{t-1}(j) = 2^{-n}$ for every $j$, the last formula becomes

$$b_{ij}(t) = \frac{s_{ji}}{\sum_{k=1}^{2^n} s_{ki}} \tag{1}$$



Note that if the state probability distribution is uniform, the state backward-transition matrix $B$ is a kind of transpose of the state transition matrix $S$: by (1), row $B_i(t)$ is column $S^i$ normalized to sum to 1. Note also that while $S$ is time constant, $B$ in general is not.

Row $B_i(t)$ of the state backward-transition matrix $B$ provides the conditional probability distribution $p(X_{t-1} \mid X_t = x_i)$, describing the network state at the instant preceding one in which the network is in state $x_i$.
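The Bayesian inversion above can be sketched as follows, computing $B(t)$ from $S$ and $p_{t-1}$ and comparing it with the uniform-case formula (1). This is an illustrative sketch with hypothetical names; rows of states that cannot be observed at instant $t$ ($p_t(i) = 0$) are left as `None`, since $b_{ij}(t)$ is undefined for them.

```python
def backward_matrix(S, p_prev):
    """State backward-transition matrix B(t) from S and p_{t-1}.

    b_ij(t) = p_{t-1}(j) s_ji / p_t(i), where p_t(i) = p_{t-1} . S^i.
    """
    m = len(S)
    B = []
    for i in range(m):
        p_t_i = sum(p_prev[k] * S[k][i] for k in range(m))  # p_{t-1} . S^i
        if p_t_i == 0:
            B.append(None)  # x_i is not observable at instant t
        else:
            B.append([p_prev[j] * S[j][i] / p_t_i for j in range(m)])
    return B


def backward_matrix_uniform(S):
    """Equation (1): with uniform p_{t-1}, b_ij(t) = s_ji / sum_k s_ki."""
    m = len(S)
    B = []
    for i in range(m):
        col = sum(S[k][i] for k in range(m))  # column S^i summed
        B.append(None if col == 0 else [S[j][i] / col for j in range(m)])
    return B


S = [[0.9, 0.1],
     [0.5, 0.5]]
B_general = backward_matrix(S, [0.5, 0.5])  # uniform p_{t-1}
B_eq1 = backward_matrix_uniform(S)          # should agree with B_general
```

With a non-uniform $p_{t-1}$ the two functions generally disagree, which is why $B$, unlike $S$, depends on $t$.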



3 Effective Information 
3.1 Introduction 

Effective information can be informally described as the quantity of information about the possible predecessors of the current state that is acquired by actually measuring the current network state, in addition to what can be acquired from knowledge of the state transition matrix alone. We propose calling it experimental information or Galileian information, given the emphasis it places on experimentally acquired knowledge with respect to purely theoretical knowledge. Here quantity of information is intended in the standard sense of Shannon's information theory.

The main question that effective information answers is: if observing the network finds that its current state is $x_i$, what additional knowledge does this measurement provide with respect to what can be known about the network from its state transition matrix alone, i.e. without knowing its current state?

Still at the informal level, this additional knowledge can be described as the reduction in uncertainty provided by the actual measurement with respect to the uncertainty existing on the basis of the state transition matrix alone.

At one extreme there are systems whose regime trajectory in the state space is a deterministic cycle. For such systems the observation provides an effective information of $\log_2 k$ bits¹, where $k$ is the number of states on the cycle, i.e. its length. A deterministic closed trajectory of length $k$ in the state space corresponds to a subset of $k$ rows of the state transition matrix, each containing exactly one value 1. Before measuring the system the uncertainty is maximum, since the system can be in any of these $k$ states, while after measuring it the predecessor of the current state is known univocally. The information acquired through the observation is therefore maximum and equal, according to the standard way of measuring information, to $\log k$ bits.

At the other extreme there are systems whose behavior in the state space is uniformly random, that is, systems where each state can be, with equal probability, the predecessor of the current state. Measuring the actual current state of these systems provides an effective information of 0 bits, since the observation brings no reduction in uncertainty (there is complete uncertainty both before and after the measurement). Likewise, for completely static systems, i.e. systems whose state stays constant over time, the observation brings no reduction in uncertainty (there is no uncertainty either before or after the measurement).

3.2 Formal definition 

We define the effective information obtained by observing that system $X$ is in state $x_i$ at instant $t$ as

$$ei(t, x_i) = D_{KL}\big(B_i(t) \,\|\, X_{t-1}\big) \tag{2}$$

where $D_{KL}$ is the Kullback-Leibler divergence². Then

$$ei(t, x_i) = -H(B_i(t)) - \sum_j b_{ij}(t) \log p(X_{t-1} = x_j)$$

Our definition is a generalization of the one provided by Tononi and coauthors (cf. equations 1A and 1B of [1]). Ours, in fact, allows studying system behavior at each time instant and for each probability distribution $X_0$, while in [1] the time instant under investigation is always $t = 1$ and the probability distribution $X_0$ is always assumed to be uniform. Our formulation hence allows modeling both the transient and the stationary regime of a system.

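For the case when the state probability distribution $X_{t-1}$ is uniform, i.e. $p(X_{t-1} = x_j) = 2^{-n}$, the formula above becomes, since $\sum_j b_{ij}(t) = 1$:

$$ei(t, x_i) = -H(B_i(t)) - \sum_j b_{ij}(t) \log \frac{1}{2^n} = -H(B_i(t)) + n \sum_j b_{ij}(t) = n - H(B_i(t))$$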
¹ From now on all logarithms are to base 2.

² The Kullback-Leibler divergence (or distance) of probability distribution $q(x)$ from probability distribution $p(x)$ is defined as $D_{KL}(p \,\|\, q) = \sum_{x \in \Omega} p(x) \log \frac{p(x)}{q(x)} = \mathbb{E}_p\!\left[\log \frac{p(x)}{q(x)}\right]$; note that it is asymmetric.
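Equation (2) and its uniform special case $ei(t, x_i) = n - H(B_i(t))$ can be checked numerically with the sketch below (hypothetical names, base-2 logarithms as assumed throughout). Terms with $b_{ij}(t) = 0$ are skipped, following the usual convention $0 \log 0 = 0$; whenever $b_{ij}(t) > 0$ we also have $p_{t-1}(j) > 0$, so the divergence is finite.

```python
from math import log2


def entropy(dist):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    return -sum(q * log2(q) for q in dist if q > 0)


def effective_information(S, p_prev, i):
    """ei(t, x_i) = D_KL(B_i(t) || X_{t-1}), equation (2), in bits."""
    m = len(S)
    p_t_i = sum(p_prev[k] * S[k][i] for k in range(m))  # p_t(i)
    if p_t_i == 0:
        raise ValueError("x_i cannot be observed at instant t")
    b = [p_prev[j] * S[j][i] / p_t_i for j in range(m)]  # row B_i(t)
    return sum(bj * log2(bj / p_prev[j]) for j, bj in enumerate(b) if bj > 0)


# With a uniform p_{t-1} over 2^n states, ei(t, x_i) = n - H(B_i(t)).
n = 1
S = [[0.9, 0.1],
     [0.5, 0.5]]
uniform = [0.5, 0.5]
ei0 = effective_information(S, uniform, 0)
assert abs(ei0 - (n - entropy([0.9 / 1.4, 0.5 / 1.4]))) < 1e-12
```

Here observing $x_0$ yields only a small amount of information (about 0.06 bits), because both states are plausible predecessors of $x_0$.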

Effective information in the regime phase of a system is obtained by considering equation (2) in the limit for the instant $t$ tending to infinity:

$$ei(x_i) = D_{KL}(B_i \,\|\, X_\infty)$$

where $B_i$ and $X_\infty$ are the stationary probability distributions defined by the limits, if they exist, of the corresponding probability distributions at instant $t$, which describe the regime phase of the system. That is:

$$p(X_\infty = x_i) = \lim_{t \to \infty} p(X_t = x_i) = p_i$$

and

$$p(B_i = x_j) = \lim_{t \to \infty} p(X_{t-1} = x_j \mid X_t = x_i) = \frac{p_j\, s_{ji}}{p_i} = b_{ij}$$

hence

$$ei(x_i) = \sum_j b_{ij} \log \frac{b_{ij}}{p_j} = -H(B_i) - \sum_j b_{ij} \log p_j$$

A system with uniformly random behavior in the regime phase has $H(B_i) = n$, since both the backward distribution $p(X_{t-1} \mid X_t)$ and the stationary distribution are uniform, $b_{ij} = p_j = \frac{1}{2^n}$; hence

$$ei(x_i) = -n - \sum_j \frac{1}{2^n} \log \frac{1}{2^n} = -n + n = 0$$

A system that is completely static in the regime phase, i.e. that remains fixed in a single attractor state $x_i$, has $H(B_i) = 0$, since the unique possible predecessor is $x_i$ itself, and $p(x_j) = 0$ if $i \ne j$, from which we have

$$ei(x_i) = -\log 1 = 0$$

Note that the sum is computed only over observable states (i.e. those with $p(x_j) \ne 0$), to avoid the indeterminate form $0 \log 0$.

A system having in the regime phase a single cyclic attractor containing all states, i.e. a deterministic closed trajectory in the state space walking through all states, has $H(B_i) = 0$, since each state has exactly one predecessor, while $p(x_j) = \frac{1}{2^n}$; hence

$$ei(x_i) = -\log \frac{1}{2^n} = n$$

The same holds, assuming the stationary state distribution is uniform, when the system has several cyclic attractors partitioning the whole state space.

If the system has a single cyclic attractor with $k < 2^n$ states (or several cyclic attractors partitioning a subset of size $k < 2^n$ of all states, still assuming a uniform stationary distribution over these $k$ states), then for every state $x_i$ on the attractors it is

$$ei(x_i) = \log k.$$
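The four regime-phase cases above can be reproduced with the following sketch for $n = 2$ nodes (4 states). The stationary distributions are supplied by hand rather than computed, since for cyclic attractors $p_t$ may oscillate and have no limit unless it starts uniform on the cycle; the function only verifies that the supplied $p$ satisfies $p = pS$. All names are hypothetical.

```python
from math import log2


def regime_ei(S, p, i):
    """ei(x_i) = D_KL(B_i || X_inf), with b_ij = p_j s_ji / p_i.

    p must be stationary (p = pS) and x_i observable (p_i > 0); terms with
    b_ij = 0 are skipped (0 log 0 = 0).
    """
    m = len(S)
    for k in range(m):
        assert abs(sum(p[j] * S[j][k] for j in range(m)) - p[k]) < 1e-12, "p is not stationary"
    assert p[i] > 0, "x_i is not observable in the regime phase"
    b = [p[j] * S[j][i] / p[i] for j in range(m)]
    return sum(bj * log2(bj / p[j]) for j, bj in enumerate(b) if bj > 0)


def deterministic(succ):
    """Transition matrix of deterministic dynamics x_j -> x_{succ[j]}."""
    m = len(succ)
    return [[1.0 if succ[j] == k else 0.0 for k in range(m)] for j in range(m)]


uniform = [0.25] * 4
random_S = [[0.25] * 4 for _ in range(4)]   # uniformly random behavior
static_S = deterministic([0, 0, 0, 0])      # every state falls into x_0
cycle4_S = deterministic([1, 2, 3, 0])      # one cycle through all 2^n states
cycle3_S = deterministic([1, 2, 0, 0])      # cycle of length k = 3; x_3 is transient

print(regime_ei(random_S, uniform, 0))                   # 0.0
print(regime_ei(static_S, [1.0, 0.0, 0.0, 0.0], 0))      # 0.0
print(regime_ei(cycle4_S, uniform, 0))                   # 2.0 (= n)
print(regime_ei(cycle3_S, [1/3, 1/3, 1/3, 0.0], 0))      # log2(3), about 1.585
```

The transient state $x_3$ of the last example has $p_3 = 0$ and is therefore not observable in the regime phase.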

The analysis in [1] assumes maximum uncertainty and uniformity of the initial system conditions, and focuses on computing effective information at the instant right after the initial state. The formulation of effective information in [1] is therefore the following particular case of ours:

$$ei_1(x_i) = ei(1, x_i) = D_{KL}\big(B_i(1) \,\|\, X_0\big)$$

Note also that, since in this particular case the assumptions used for the derivation of (1) hold, it can be written

$$ei_1(x_i) = n - H(B_i(1)), \qquad b_{ij}(1) = \frac{s_{ji}}{\sum_{k=1}^{2^n} s_{ki}}$$

3.3 Effective information of subsets 

For the definition of integrated information it is necessary to define how to measure effective information for subsets of a given network $X$. Let $A \subseteq X$. When $X$ is in state $x_i$ we denote with $\pi_A(x_i) = a_i$ the state of $A$. Let $A_t$ be the random variable representing the state of $A$ at instant $t$. We can define for $A$ a state transition matrix and a state backward-transition matrix, in analogy with the general case, as

$${}^{A}s_{ij} = p(A_{t+1} = a_j \mid A_t = a_i)$$

and

$${}^{A}b_{ij}(t) = p(A_{t-1} = a_j \mid A_t = a_i)$$

Both can be obtained from $S$ and $p_t(\cdot)$ after some long but straightforward computations. Intuitively and informally speaking, the computation is based on summing transition probabilities over all states of $X$ that are equivalent with respect to subset $A$, weighted by their state probabilities.
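One way to carry out the marginalization described above is sketched below: the joint probability $p(A_{t-1} = a_i, A_t = a_j)$ is accumulated over all pairs of full states, and both ${}^{A}S$ (for the step from $t - 1$ to $t$) and ${}^{A}B(t)$ are read off from it. This is an illustrative sketch with hypothetical names that takes $p_{t-1}$ as input; rows of subset states with zero probability are filled with zeros.

```python
from itertools import product


def project(x, A):
    """pi_A(x): restriction of the full state x (a bit tuple) to the nodes in A."""
    return tuple(x[k] for k in A)


def subset_matrices(states, S, p_prev, A):
    """Return (subset states, ^A S for the step t-1 -> t, ^A B(t)).

    joint[i][j] = p(A_{t-1} = a_i, A_t = a_j)
                = sum over x with pi_A(x) = a_i and y with pi_A(y) = a_j
                  of p_{t-1}(x) s_xy
    """
    sub = list(product((0, 1), repeat=len(A)))
    idx = {a: k for k, a in enumerate(sub)}
    m = len(sub)
    joint = [[0.0] * m for _ in range(m)]
    for x, px, row in zip(states, p_prev, S):
        for y, sxy in zip(states, row):
            joint[idx[project(x, A)]][idx[project(y, A)]] += px * sxy
    before = [sum(r) for r in joint]                                # p(A_{t-1} = a_i)
    after = [sum(joint[i][j] for i in range(m)) for j in range(m)]  # p(A_t = a_j)
    AS = [[joint[i][j] / before[i] if before[i] else 0.0 for j in range(m)]
          for i in range(m)]
    AB = [[joint[j][i] / after[i] if after[i] else 0.0 for j in range(m)]
          for i in range(m)]
    return sub, AS, AB


# Hypothetical 2-node deterministic network: (x0, x1) -> (1 - x1, x0).
states = list(product((0, 1), repeat=2))
S = [[1.0 if y == (1 - x[1], x[0]) else 0.0 for y in states] for x in states]
sub, AS, AB = subset_matrices(states, S, [0.25] * 4, (0,))
```

In this example node 0 at instant $t$ depends only on node 1 at $t - 1$, so, seen in isolation, subset $A = \{0\}$ looks uniformly random: every entry of ${}^{A}S$ and ${}^{A}B(t)$ is $0.5$.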

Now, all definitions introduced for a network $X$ can be applied to any subset of nodes $A$ by substituting $S$, $B$ and $X$ in the previous formulas with ${}^{A}S$, ${}^{A}B$ and $A$, respectively. We then obtain

$$ei(t, A, a_h) = D_{KL}\big({}^{A}B_h(t) \,\|\, A_{t-1}\big) \tag{3}$$



4 Integrated Information 

We are now ready to formally define integrated information, that is the quantity of information generated 
in a system transitioning from one state to the next by the causal interaction of its parts, above and beyond 
the quantity of information generated independently by each of its parts. 

Given a system $X$, let $V \subseteq X$ and let $\{M_k\}$ be a partition of $V$ into $m$ subsets. Let $M_k(t)$ be the random variables describing the state of the $k$-th component of the partition at instant $t$. Let $X$ be in state $x_i$ at instant $t$. Then at the same instant $V$ is in state ${}^{V}x_i$ and the $k$-th component is in state ${}^{M_k}x_i$. In the following we use $v_h$ and $\mu_k$ as shorthands for ${}^{V}x_i$ and ${}^{M_k}x_i$, respectively.

Partition-dependent integrated information is first defined for a subset $V$ as a function of the partition $\{M_k\}$, the time instant $t$, and the current state $v_h$ as

$$\phi(t, V, \{M_k\}, v_h) = ei(t, V, v_h) - \sum_{k=1}^{m} ei(t, M_k, \mu_k) \tag{4}$$

The value computed by this formula clearly depends on the partition considered. Typically, an unbalanced partition produces a lower value of $\phi$ (see [1]). Hence the following normalization function is introduced:

$$N(t, V, \{M_k\}, v_h) = (m - 1) \min_k \{H(M_k(t))\}$$

Then, the Minimum Information Partition (MIP) is defined as the partition providing the minimum value of the integrated information after normalization, that is

$$P_{MIP}(t, V, v_h) = \arg\min_{\{M_k\}} \frac{\phi(t, V, \{M_k\}, v_h)}{N(t, V, \{M_k\}, v_h)}$$

The above formula has been defined by Tononi for generic partitions, but in all of his papers, as well as here, only the case of bipartitions, i.e. partitions into two subsets, is discussed.
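For bipartitions, equation (4), the normalization $N$ and the choice of the MIP can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: states are bit tuples in `itertools.product` order, $p_{t-1}$ is passed explicitly, $H(M_k(t))$ is computed from $p_t = p_{t-1} S$, bipartitions with $N = 0$ are skipped to avoid dividing by zero, and all function names are hypothetical.

```python
from itertools import combinations, product
from math import log2


def subset_ei(states, S, p_prev, A, x_now):
    """ei(t, A, a_h) = D_KL(^A B_h(t) || A_{t-1}), a_h = restriction of x_now to A.

    x_now must be reachable at instant t, otherwise ^A B_h(t) is undefined.
    """
    a_h = tuple(x_now[k] for k in A)
    back, prior = {}, {}
    for x, px, row in zip(states, p_prev, S):
        a = tuple(x[k] for k in A)
        prior[a] = prior.get(a, 0.0) + px                 # p(A_{t-1} = a)
        for y, sxy in zip(states, row):
            if tuple(y[k] for k in A) == a_h:
                back[a] = back.get(a, 0.0) + px * sxy     # p(A_{t-1} = a, A_t = a_h)
    z = sum(back.values())                                # p(A_t = a_h)
    return sum(v / z * log2(v / z / prior[a]) for a, v in back.items() if v > 0)


def marginal_entropy(states, p, A):
    """H(M_k(t)): entropy of the distribution p restricted to the nodes in A."""
    q = {}
    for x, px in zip(states, p):
        a = tuple(x[k] for k in A)
        q[a] = q.get(a, 0.0) + px
    return -sum(v * log2(v) for v in q.values() if v > 0)


def mip_bipartition(states, S, p_prev, V, x_now):
    """Return (phi/N, phi, (M1, M2)) for the bipartition of V minimising phi/N."""
    m = len(S)
    p_now = [sum(p_prev[j] * S[j][i] for j in range(m)) for i in range(m)]
    whole = subset_ei(states, S, p_prev, V, x_now)
    best = None
    for r in range(1, len(V) // 2 + 1):
        for M1 in combinations(V, r):
            M2 = tuple(v for v in V if v not in M1)
            phi = (whole - subset_ei(states, S, p_prev, M1, x_now)
                   - subset_ei(states, S, p_prev, M2, x_now))              # equation (4)
            N = min(marginal_entropy(states, p_now, M) for M in (M1, M2))  # m - 1 = 1
            if N > 0 and (best is None or phi / N < best[0]):
                best = (phi / N, phi, (M1, M2))
    return best


# Hypothetical 2-node network in which the nodes swap states: (x0, x1) -> (x1, x0).
states = list(product((0, 1), repeat=2))
S = [[1.0 if y == (x[1], x[0]) else 0.0 for y in states] for x in states]
print(mip_bipartition(states, S, [0.25] * 4, (0, 1), (0, 1)))  # (2.0, 2.0, ((0,), (1,)))
```

In the swap network each node alone looks random (0 bits), while the pair pins down its predecessor exactly (2 bits), so all the effective information is generated by the interaction of the two parts.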

Integrated information $\phi$ for subset $V$, in state $v_h$ at instant $t$, is now formally defined as the value of the partition-dependent integrated information computed on the MIP, that is

$$\phi(t, V, v_h) = \phi\big(t, V, P_{MIP}(t, V, v_h), v_h\big)$$

It is now possible to formally define the value of integrated information for the whole system $X$. A subset $V \subseteq X$ having $\phi > 0$ is called a complex. If it is not a proper subset of another subset with a larger $\phi$, it is called a main complex. The value of integrated information of $X$, in state $x_i$ at instant $t$, is defined as the value of integrated information of its main complex of maximum value:

$$\phi(t, x_i) = \max_{V \subseteq X} \phi\big(t, V, P_{MIP}(t, V, v_h), v_h\big)$$

The value of integrated information averaged over all states of the system is obtained through the state probability distribution $p_t(\cdot)$, that is

$$\langle \phi \rangle(t) = \sum_{i=1}^{2^n} p_t(i)\, \phi(t, x_i)$$

5 Integrated information in disconnected systems

Intuitively, any system having a partition into two independent subsets, i.e. that can be partitioned into two subsets such that no node in one subset affects the state of the nodes in the other subset, should have zero integrated information.

We now give a formal proof of this property, which to the best of our knowledge has never appeared in the literature. We consider the value of integrated information assuming that at instant $t - 1$ the system has a uniform state probability distribution, consistently with the discussion in [1]. Recall that for a subset $V$ of the system $X$ in state $x_h$ we use $v_h$ as a shorthand for ${}^{V}x_h$, the restriction of $x_h$ to the nodes in $V$.

Theorem 1 (Integrated information in a disconnected network) Let $A'$ and $A''$ be two disjoint subsets of a network $X$ that are independent in the above sense, with $A' \cup A'' = V \subseteq X$. Let us denote with $v_h$ the current state of $V$, and with $a'_h$ and $a''_h$ the current states of subsets $A'$ and $A''$, respectively. For each state $v_h$ and time instant $t$ it is

$$\phi(t, V, \{A', A''\}, v_h) = 0$$
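
Proof. From definition (4) of partition-dependent integrated information and definition (3) of effective information for a subset it is

$$\begin{aligned} \phi(t, V, \{A', A''\}, v_h) &= ei(t, V, v_h) - ei(t, A', a'_h) - ei(t, A'', a''_h) \\ &= D_{KL}\big({}^{V}B_h(t) \,\|\, V_{t-1}\big) - D_{KL}\big({}^{A'}B_{h'}(t) \,\|\, A'_{t-1}\big) - D_{KL}\big({}^{A''}B_{h''}(t) \,\|\, A''_{t-1}\big) \end{aligned}$$

where $h'$ and $h''$ denote the rows of ${}^{A'}B$ and ${}^{A''}B$ corresponding to states $a'_h$ and $a''_h$.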

From the definition of the Kullback-Leibler divergence it is

$$D_{KL}\big({}^{V}B_h(t) \,\|\, V_{t-1}\big) = -H({}^{V}B_h(t)) - \sum_j {}^{V}b_{hj}(t) \log p(V_{t-1} = v_j)$$

Recall that ${}^{V}B_h(t)$ is a conditional probability distribution for the state preceding the current one:

$$p\big({}^{V}B_h(t) = v_j\big) = p(V_{t-1} = v_j \mid V_t = v_h) = p(A'_{t-1} = a'_j \wedge A''_{t-1} = a''_j \mid V_t = v_h)$$

Applying the chain rule of entropy it is

$$H({}^{V}B_h(t)) = H(A'_{t-1} \mid V_t = v_h) + H(A''_{t-1} \mid A'_{t-1}, V_t = v_h)$$

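and, given the independence between $A''$ and $A'$, it follows that

$$\begin{aligned} H({}^{V}B_h(t)) &= H(A'_{t-1} \mid V_t = v_h) + H(A''_{t-1} \mid V_t = v_h) \\ &= H(A'_{t-1} \mid A'_t = a'_h) + H(A''_{t-1} \mid A''_t = a''_h) \\ &= H({}^{A'}B_{h'}(t)) + H({}^{A''}B_{h''}(t)) \end{aligned}$$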

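From the assumption of a uniform state probability distribution at $t - 1$ it is

$$D_{KL}\big({}^{V}B_h(t) \,\|\, V_{t-1}\big) = |V| - H({}^{V}B_h(t))$$
$$D_{KL}\big({}^{A'}B_{h'}(t) \,\|\, A'_{t-1}\big) = |A'| - H({}^{A'}B_{h'}(t))$$
$$D_{KL}\big({}^{A''}B_{h''}(t) \,\|\, A''_{t-1}\big) = |A''| - H({}^{A''}B_{h''}(t))$$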

and substituting the right-hand sides above for the left-hand sides in the first equation of this proof, and considering that $|V| = |A'| + |A''|$, we obtain

$$\phi(t, V, \{A', A''\}, v_h) = |V| - |A'| - |A''| - H({}^{V}B_h(t)) + H({}^{A'}B_{h'}(t)) + H({}^{A''}B_{h''}(t)) = 0$$

□
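Theorem 1 can be checked numerically on a small example. The sketch below (hypothetical names and network, uniform $p_{t-1}$ as in the theorem) builds $S$ with the product formula of Section 2, evaluates equation (4) for the bipartition $\{A', A''\}$ in every state, and contrasts it with a variant in which the node of $A''$ reads a node of $A'$. It is a sanity check of the statement on one instance, not a substitute for the proof.

```python
from itertools import product
from math import log2


def transition_matrix(laws):
    """s_xy = prod_k p_k, with p_k = r_k(x) if y_k = 1 and 1 - r_k(x) otherwise."""
    states = list(product((0, 1), repeat=len(laws)))
    S = []
    for x in states:
        r = [law(x) for law in laws]
        row = []
        for y in states:
            s = 1.0
            for rk, yk in zip(r, y):
                s *= rk if yk == 1 else 1.0 - rk
            row.append(s)
        S.append(row)
    return states, S


def subset_ei(states, S, p_prev, A, x_now):
    """ei(t, A, a_h) = D_KL(^A B_h(t) || A_{t-1}), obtained by marginalisation."""
    a_h = tuple(x_now[k] for k in A)
    back, prior = {}, {}
    for x, px, row in zip(states, p_prev, S):
        a = tuple(x[k] for k in A)
        prior[a] = prior.get(a, 0.0) + px
        for y, sxy in zip(states, row):
            if tuple(y[k] for k in A) == a_h:
                back[a] = back.get(a, 0.0) + px * sxy
    z = sum(back.values())
    return sum(v / z * log2(v / z / prior[a]) for a, v in back.items() if v > 0)


def phi_partition(states, S, p_prev, parts, x_now):
    """Equation (4): ei(t, V, v_h) - sum_k ei(t, M_k, mu_k), V = union of parts."""
    V = tuple(sorted(k for part in parts for k in part))
    return subset_ei(states, S, p_prev, V, x_now) - sum(
        subset_ei(states, S, p_prev, part, x_now) for part in parts)


# Hypothetical 3-node network: A' = (0, 1) and A'' = (2,) never read each other.
disconnected = [
    lambda x: 0.8 if x[1] else 0.3,  # node 0 reads node 1
    lambda x: 0.9 if x[0] else 0.2,  # node 1 reads node 0
    lambda x: 0.1 if x[2] else 0.7,  # node 2 reads only itself
]
states, S = transition_matrix(disconnected)
uniform = [1 / len(states)] * len(states)
parts = [(0, 1), (2,)]
print(max(abs(phi_partition(states, S, uniform, parts, x)) for x in states))  # ~0 (rounding only)

# Variant: node 2 now reads node 0, so A' and A'' are no longer independent.
connected = disconnected[:2] + [lambda x: 0.9 if x[0] else 0.1]
_, S2 = transition_matrix(connected)
print(phi_partition(states, S2, uniform, parts, (1, 1, 1)))  # clearly positive
```

In the connected variant, the state of node 2 carries extra evidence about the previous state of node 0, which neither part can account for on its own, so $\phi$ becomes positive.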



6 Conclusions 

In this paper we have given a thorough presentation of a model proposed by Giulio Tononi [15][6][10] for modeling integrated information, i.e. how much information is generated in a system by the causal interaction of its parts, above and beyond the information given by the sum of its parts. The model aims at formally capturing what consciousness is in living beings [3][7][8], and the reader is referred to Tononi's papers for detailed motivations of the model.

We have considered the discrete version of the model [1][2][7]. The original papers describing the model are not always fully clear in their mathematical formulation, and here we have given the first formal description of the model in which all steps are presented in detail.

In doing so we have provided a more general formulation of the model, which is independent of the time instant chosen for the analysis and of the uniformity of the probability distribution at the initial time instant.

Finally, we have also given here the first formal proof that a system made up of independent parts has zero integrated information.

Acknowledgments. We would like to thank Luciano Guala and Guido Proietti for useful and interesting discussions related to the work described here.